Unsupervised Semantic Action Discovery from Video Collections
Authors
O. Sener, Cornell University, Ithaca NY 14853, USA
A.R. Zamir, Stanford University, Stanford CA 94305, USA
C. Wu, Cornell University, Ithaca NY 14853, USA
S. Savarese, Stanford University, Stanford CA 94305, USA
A. Saxena, Brain of Things Inc, Cupertino CA 95014, USA
Abstract
Human communication takes many forms, including speech, text and instructional videos. It typically has an underlying structure, with a starting point, an ending, and certain objective steps between them. In this paper, we consider instructional videos, of which there are tens of millions on the Internet. We propose a method for parsing a video into such semantic steps in an unsupervised way. Our method is capable of providing a semantic “storyline” of the video composed of its objective steps. We accomplish this using both visual and language cues in a joint generative model. Our method can also provide a textual description for each of the identified semantic steps and video segments. We evaluate our method on a large number of complex YouTube videos and show that it discovers semantically correct instructions for a variety of tasks.
A first version of this paper appeared in ICCV 2015. This extended version has more details on the learning algorithm and hierarchical clustering with full derivation, additional analysis of the robustness to subtitle noise, and a novel application in robotics.
Similar resources
Action Change Detection in Video Based on HOG
Background and Objectives: Action recognition, the process of labeling an unknown action in a query video, is a challenging problem due to event complexity, variations in imaging conditions, and intra- and inter-individual action variability. A number of solutions have been proposed to solve the action recognition problem. Many of these frameworks suppose that each video sequence includes only one ...
On Linking Heterogeneous Dataset Collections
Link discovery is the problem of linking entities between two or more datasets, based on some (possibly unknown) specification. A blocking scheme is a one-to-many mapping from entities to blocks. Blocking methods avoid O(n^2) comparisons by clustering entities into blocks, and limiting the evaluation of link specifications to entity pairs within blocks. Current link-discovery blocking methods exp...
Unsupervised Alignment of Actions in Video with Text Descriptions
Advances in video technology and data storage have made large scale video data collections of complex activities readily accessible. An increasingly popular approach for automatically inferring the details of a video is to associate the spatiotemporal segments in a video with its natural language descriptions. Most algorithms for connecting natural language with video rely on pre-aligned superv...
Top-down Analysis of Low-level Object Relatedness Leading to Semantic Understanding of Medieval Image Collections
The aim of image understanding, which is a long standing goal of computer vision, is to develop algorithms with which computers can advance to the semantic content of images. One ability of such algorithms would be the automatic discovery of relations between different objects in large collections of images. To analyze this relatedness we present an unsupervised and a semi-supervised approach f...
Extracting Latent Attributes from Video Scenes Using Text as Background Knowledge
We explore the novel task of identifying latent attributes in video scenes, such as the mental states of actors, using only large text collections as background knowledge and minimal information about the videos, such as activity and actor types. We formalize the task and a measure of merit that accounts for the semantic relatedness of mental state terms. We develop and test several largely uns...
Journal title:
- CoRR
Volume abs/1605.03324 Issue
Pages -
Publication date 2016